Fast Convergence of Belief Propagation to Global Optima: Beyond Correlation Decay
Belief propagation is a fundamental message-passing algorithm for
probabilistic reasoning and inference in graphical models. While it is known to
be exact on trees, in most applications belief propagation is run on graphs
with cycles. Understanding the behavior of "loopy" belief propagation has been
a major challenge for researchers in machine learning, and positive convergence
results for BP are known under strong assumptions which imply the underlying
graphical model exhibits decay of correlations. We show that under a natural
initialization, BP converges quickly to the global optimum of the Bethe free
energy for Ising models on arbitrary graphs, as long as the Ising model is
\emph{ferromagnetic} (i.e. neighbors prefer to be aligned). This holds even
though such models can exhibit long range correlations and may have multiple
suboptimal BP fixed points. We also show an analogous result for iterating the
(naive) mean-field equations; perhaps surprisingly, both results are
dimension-free in the sense that a constant number of iterations already
provides a good estimate to the Bethe/mean-field free energy.Comment: 24 pages; comments welcome
Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications
Markov random fields are a popular model for high-dimensional probability
distributions. Over the years, many mathematical, statistical and algorithmic
problems on them have been studied. Until recently, the only known algorithms
for provably learning them relied on exhaustive search, correlation decay or
various incoherence assumptions. Bresler gave an algorithm for learning general
Ising models on bounded degree graphs. His approach was based on a structural
result about mutual information in Ising models.
Here we take a more conceptual approach to proving lower bounds on the mutual
information through setting up an appropriate zero-sum game. Our proof
generalizes well beyond Ising models, to arbitrary Markov random fields with
higher order interactions. As an application, we obtain algorithms for learning
Markov random fields on bounded degree graphs on $n$ nodes with $r$-order
interactions in $n^r$ time and $\log n$ sample complexity. The sample
complexity is information theoretically optimal up to the dependence on the
maximum degree. The running time is nearly optimal under standard conjectures
about the hardness of learning parity with noise.
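As a toy illustration of the role mutual information plays in structure learning (Bresler-style neighborhood screening, shown only for intuition; it is not the zero-sum-game argument or the algorithm of this paper), one can estimate pairwise mutual information from samples and keep the pairs that exceed a threshold:

```python
import numpy as np
from itertools import combinations

def empirical_mutual_information(x, y):
    """Plug-in estimate (in nats) of the mutual information between two +/-1 samples."""
    mi = 0.0
    for a in (-1, 1):
        for b in (-1, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def screen_edges(samples, threshold):
    """Propose candidate edges {i, j} whose empirical mutual information exceeds threshold.

    samples: (num_samples, n) array of +/-1 spin configurations drawn from the model.
    """
    n = samples.shape[1]
    return [(i, j) for i, j in combinations(range(n), 2)
            if empirical_mutual_information(samples[:, i], samples[:, j]) > threshold]
```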
The Vertex Sample Complexity of Free Energy is Polynomial
We study the following question: given a massive Markov random field on $n$
nodes, can a small sample from it provide a rough approximation to the free
energy $F_n = \log Z_n$?
Results in the graph limit literature by Borgs, Chayes, Lov\'asz, S\'os, and
Vesztergombi show that for Ising models on $n$ nodes with interactions of
strength $O(1/n)$, an $\epsilon n$ approximation to $F_n$ can be achieved by
sampling a randomly induced model on $2^{O(1/\epsilon^2)}$ nodes.
We show that the sampling complexity of this problem is {\em polynomial in}
$1/\epsilon$. We further show that a polynomial dependence on $1/\epsilon$
cannot be avoided.
Our results are very general as they apply to higher order Markov random
fields. For Markov random fields of order $r$, we obtain an algorithm that
achieves an $\epsilon n$ approximation using a number of samples polynomial in
$r$ and $1/\epsilon$ and running time that is $2^{O(1/\epsilon^2)}$ up to
polynomial factors in $r$ and $1/\epsilon$. For ferromagnetic Ising models, the
running time is polynomial in $1/\epsilon$.
Our results are intimately connected to recent research on the regularity
lemma and property testing, where the interest is in finding which properties
can be tested within $\epsilon$ error in time polynomial in $1/\epsilon$. In
particular, our proofs build on results from a recent work by Alon, de la Vega,
Kannan and Karpinski, who also introduced the notion of polynomial vertex
sample complexity. Another critical ingredient of the proof is an effective
bound by the authors of the paper relating the variational free energy and the
free energy.
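A minimal sketch of the vertex-sampling idea, assuming the dense normalization above (couplings of strength $O(1/n)$): draw $k$ random vertices, rescale the induced couplings by $n/k$, and compute the free energy of the small model by brute force. The rescaling convention and the estimator below are illustrative and are not the paper's exact procedure or guarantee.

```python
import numpy as np
from itertools import product

def log_partition_bruteforce(J, h):
    """Exact log Z of a small Ising model by enumerating all 2^k configurations.

    J is assumed symmetric with zero diagonal, so the energy is 0.5*s'Js + h's.
    """
    energies = []
    for sigma in product([-1, 1], repeat=len(h)):
        s = np.array(sigma)
        energies.append(0.5 * s @ J @ s + h @ s)
    energies = np.array(energies)
    m = energies.max()
    return m + np.log(np.exp(energies - m).sum())       # numerically stable log-sum-exp

def sampled_free_energy_density(J, h, k, rng):
    """Estimate (1/n) log Z_n of a dense model from a random induced submodel on k nodes."""
    n = len(h)
    idx = rng.choice(n, size=k, replace=False)
    J_sub = J[np.ix_(idx, idx)] * (n / k)               # rescale O(1/n) couplings to O(1/k)
    return log_partition_bruteforce(J_sub, h[idx]) / k
```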
Approximating Partition Functions in Constant Time
We study approximations of the partition function of dense graphical models.
Partition functions of graphical models play a fundamental role in statistical
physics, in statistics and in machine learning. Two of the main methods for
approximating the partition function are Markov Chain Monte Carlo and
Variational Methods. An impressive body of work in mathematics, physics and
theoretical computer science provides conditions under which Markov Chain Monte
Carlo methods converge in polynomial time. These methods often lead to
polynomial time approximation algorithms for the partition function in cases
where the underlying model exhibits correlation decay. There are very few
theoretical guarantees for the performance of variational methods. One
exception is recent results by Risteski (2016) who considered dense graphical
models and showed that using variational methods, it is possible to find an
$\epsilon n$ additive approximation to the log partition function in time
$n^{O(1/\epsilon^2)}$, even in a regime where correlation decay does not hold.
We show that under essentially the same conditions, an $\epsilon n$ additive
approximation of the log partition function can be found in constant time,
independent of $n$. In particular, our results cover dense Ising and Potts
models as well as dense graphical models with $k$-wise interaction. They
also apply for low threshold rank models.
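For orientation, the variational methods mentioned here start from the Gibbs variational characterization of the log partition function (a standard fact, not a result of this paper): for an energy function $f$ whose Gibbs measure is proportional to $e^{f}$,
\[
  \log Z \;=\; \sup_{\mu}\Big(\mathbb{E}_{\sigma\sim\mu}\big[f(\sigma)\big] + H(\mu)\Big),
\]
where the supremum runs over all probability distributions $\mu$ on spin configurations and $H(\mu)$ is the Shannon entropy. Restricting $\mu$ to a tractable family, such as product measures or pseudo-distributions from a convex relaxation, is what turns this identity into an algorithm.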
Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective
The free energy is a key quantity of interest in Ising models, but
unfortunately, computing it in general is computationally intractable. Two
popular (variational) approximation schemes for estimating the free energy of
general Ising models (in particular, even in regimes where correlation decay
does not hold) are: (i) the mean-field approximation with roots in statistical
physics, which estimates the free energy from below, and (ii) hierarchies of
convex relaxations with roots in theoretical computer science, which estimate
the free energy from above. We show, surprisingly, that the tight regime for
both methods to compute the free energy to leading order is identical.
More precisely, we show that the mean-field approximation is within
$O((n\|J\|_F)^{2/3})$ of the free energy, where $\|J\|_F$ denotes the
Frobenius norm of the interaction matrix of the Ising model. This
simultaneously subsumes both the breakthrough work of Basak and Mukherjee, who
showed the tight result that the mean-field approximation is within $o(n)$
whenever $\|J\|_F = o(\sqrt{n})$, as well as the work of Jain, Koehler, and
Mossel, who gave the previously best known non-asymptotic bound of
$O((n\|J\|_F)^{2/3}\log^{1/3}(n\|J\|_F))$. We give a simple, algorithmic
proof of this result using a convex relaxation proposed by Risteski based on
the Sherali-Adams hierarchy, automatically giving sub-exponential time
approximation schemes for the free energy in this entire regime. Our
algorithmic result is tight under Gap-ETH.
We furthermore combine our techniques with spin glass theory to prove (in a
strong sense) the optimality of correlation rounding, refuting a recent
conjecture of Allen, O'Donnell, and Zhou. Finally, we give the tight
generalization of all of these results to $k$-MRFs, capturing as a special case
previous work on approximating MAX-$k$-CSP.
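Written out under one common normalization (the paper's conventions and constants may differ), the naive mean-field functional and the bound described above read
\[
  F^{\mathrm{MF}} \;=\; \max_{x\in[-1,1]^n}\Big(\tfrac{1}{2}\,x^{\top} J x + \sum_{i=1}^{n} H\big(\tfrac{1+x_i}{2}\big)\Big),
  \qquad
  0 \;\le\; F - F^{\mathrm{MF}} \;\le\; O\big((n\,\|J\|_F)^{2/3}\big),
\]
where $F = \log Z$ is the free energy and $H(p)$ denotes the binary entropy function.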
Learning Some Popular Gaussian Graphical Models without Condition Number Bounds
Gaussian Graphical Models (GGMs) have wide-ranging applications in machine
learning and the natural and social sciences. In most of the settings in which
they are applied, the number of observed samples is much smaller than the
dimension and they are assumed to be sparse. While there are a variety of
algorithms (e.g. Graphical Lasso, CLIME) that provably recover the graph
structure with a logarithmic number of samples, they assume various conditions
that require the precision matrix to be in some sense well-conditioned.
Here we give the first polynomial-time algorithms for learning attractive
GGMs and walk-summable GGMs with a logarithmic number of samples without any
such assumptions. In particular, our algorithms can tolerate strong
dependencies among the variables. Our result for structure recovery in
walk-summable GGMs is derived from a more general result for efficient sparse
linear regression in walk-summable models without any norm dependencies. We
complement our results with experiments showing that many existing algorithms
fail even in some simple settings where there are long dependency chains,
whereas ours do not.
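For context, the sketch below runs one of the baseline methods mentioned above, Graphical Lasso (via scikit-learn), on a synthetic chain-structured GGM, the kind of long-dependency-chain setting the experiments highlight; the chain strength, regularization parameter, and sample size are arbitrary illustrative choices, and this is not the authors' new algorithm.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Sparse "chain" precision matrix: long dependency chains make the problem
# badly conditioned, which is the regime the abstract is concerned with.
n = 30
Theta = 1.01 * np.eye(n)
for i in range(n - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.49           # near-critical chain couplings
Sigma = np.linalg.inv(Theta)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(n), Sigma, size=200)

# Baseline structure estimate: threshold the Graphical Lasso precision matrix.
model = GraphicalLasso(alpha=0.05).fit(X)
support = np.abs(model.precision_) > 1e-3
print("estimated number of edges:", (support.sum() - n) // 2)
```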
Accuracy-Memory Tradeoffs and Phase Transitions in Belief Propagation
The analysis of Belief Propagation and other algorithms for the {\em
reconstruction problem} plays a key role in the analysis of community detection
in inference on graphs, phylogenetic reconstruction in bioinformatics, and the
cavity method in statistical physics.
We prove a conjecture of Evans, Kenyon, Peres, and Schulman (2000) which
states that any bounded memory message passing algorithm is statistically much
weaker than Belief Propagation for the reconstruction problem. More formally,
any recursive algorithm with bounded memory for the reconstruction problem on
trees with the binary symmetric channel has a phase transition strictly
below the Belief Propagation threshold, also known as the Kesten-Stigum bound.
The proof combines, in a novel fashion, tools from recursive reconstruction,
information theory, and optimal transport, and also establishes an asymptotic
normality result for BP and other message-passing algorithms near the critical
threshold.
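A minimal simulation of this setting, assuming a $d$-ary tree and the binary symmetric channel: broadcast a root spin to the leaves and run exact BP (dynamic programming up the tree) to reconstruct it. The arity, noise level, depth, and trial count are illustrative, and the bounded-memory algorithms the theorem is about are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, eps, depth = 3, 0.15, 8                    # d-ary tree, BSC flip probability, depth
theta = 1.0 - 2.0 * eps
print("Kesten-Stigum parameter d*theta^2 =", d * theta**2)   # > 1: BP reconstructs

def broadcast_leaves(root):
    """Broadcast a +/-1 root spin down the tree through a binary symmetric channel."""
    spins = np.array([root])
    for _ in range(depth):
        spins = np.repeat(spins, d)                            # give each node d children
        spins = np.where(rng.random(spins.size) < eps, -spins, spins)
    return spins

def bp_root_llr(leaves):
    """Exact BP on the tree: log-likelihood ratio for the root given the leaves."""
    msg = 2.0 * np.arctanh(theta) * leaves                     # leaf -> parent messages
    for _ in range(depth - 1):
        belief = msg.reshape(-1, d).sum(axis=1)                # combine the d children
        msg = 2.0 * np.arctanh(theta * np.tanh(belief / 2.0))  # pass through the channel
    return msg.reshape(-1, d).sum(axis=1)[0]                   # belief at the root

trials, correct = 400, 0
for _ in range(trials):
    root = rng.choice([-1, 1])
    correct += int(np.sign(bp_root_llr(broadcast_leaves(root))) == root)
print("BP reconstruction accuracy:", correct / trials)
```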
Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis
There has been a large amount of interest, both in the past and particularly
recently, in the power of different families of universal approximators, e.g.
ReLU networks, polynomials, rational functions. However, current research has
focused almost exclusively on understanding this problem in a worst-case
setting, e.g. bounding the error of the best infinity-norm approximation in a
box. In this setting a high-degree polynomial is required to even approximate a
single ReLU.
However, in real applications with high dimensional data we expect it is only
important to approximate the desired function well on certain relevant parts of
its domain. With this motivation, we analyze the ability of neural networks and
polynomial kernels of bounded degree to achieve good statistical performance on
a simple, natural inference problem with sparse latent structure. We give
almost-tight bounds on the performance of both neural networks and low degree
polynomials for this problem. Our bounds for polynomials involve new techniques
which may be of independent interest and show major qualitative differences
with what is known in the worst-case setting.
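As a toy numerical illustration of the worst-case claim above, one can fit low-degree polynomials to a single ReLU on $[-1,1]$ (a Chebyshev least-squares fit on a fine grid, used here as a stand-in for the best sup-norm approximation) and observe that the sup-norm error decays only slowly with the degree:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 4001)
relu = np.maximum(x, 0.0)
for degree in (3, 7, 15, 31, 63):
    coeffs = np.polynomial.chebyshev.chebfit(x, relu, degree)   # least-squares fit
    approx = np.polynomial.chebyshev.chebval(x, coeffs)
    print(f"degree {degree:2d}: sup error ~ {np.abs(approx - relu).max():.4f}")
```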
A Phase Transition in Arrow's Theorem
Arrow's Theorem concerns a fundamental problem in social choice theory: given
the individual preferences of members of a group, how can they be aggregated to
form rational group preferences? Arrow showed that in an election between three
or more candidates, there are situations where any voting rule satisfying a
small list of natural "fairness" axioms must produce an apparently irrational
intransitive outcome. Furthermore, quantitative versions of Arrow's Theorem in
the literature show that when voters choose rankings in an i.i.d.\ fashion, the
outcome is intransitive with non-negligible probability.
It is natural to ask if such a quantitative version of Arrow's Theorem holds
for non-i.i.d.\ models. To answer this question, we study Arrow's Theorem under
a natural non-i.i.d.\ model of voters inspired by canonical models in
statistical physics; indeed, a version of this model was previously introduced
by Raffaelli and Marsili in the physics literature. This model has a parameter,
temperature, that prescribes the correlation between different voters. We show
that the behavior of Arrow's Theorem in this model undergoes a striking phase
transition: in the entire high temperature regime of the model, a Quantitative
Arrow's Theorem holds showing that the probability of paradox for any voting
rule satisfying the axioms is non-negligible; this is tight because the
probability of paradox under pairwise majority goes to zero when approaching
the critical temperature, and becomes exponentially small in the number of
voters beyond it. We prove this occurs in another natural model of correlated
voters and conjecture that this phenomenon is quite general.
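For the i.i.d.\ baseline mentioned above, a quick Monte Carlo estimate of the classical Condorcet paradox probability (pairwise majority over three candidates being intransitive under i.i.d.\ uniform rankings) can be obtained as follows; the number of voters and trials are arbitrary illustrative choices.

```python
import numpy as np
from itertools import permutations

def majority_prefers(votes, x, y):
    """True if a strict majority of the rankings place candidate x above candidate y."""
    return sum(v.index(x) < v.index(y) for v in votes) > len(votes) / 2

rng = np.random.default_rng(0)
rankings = list(permutations("ABC"))
num_voters, trials, paradoxes = 101, 20000, 0
for _ in range(trials):
    votes = [rankings[i] for i in rng.integers(len(rankings), size=num_voters)]
    ab = majority_prefers(votes, "A", "B")
    bc = majority_prefers(votes, "B", "C")
    ca = majority_prefers(votes, "C", "A")
    paradoxes += int(ab == bc == ca)          # a 3-cycle in either direction is intransitive
print("estimated probability of a Condorcet paradox:", paradoxes / trials)
```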
A Spectral Condition for Spectral Gap: Fast Mixing in High-Temperature Ising Models
We prove that Ising models on the hypercube with general quadratic
interactions satisfy a Poincar\'{e} inequality with respect to the natural
Dirichlet form corresponding to Glauber dynamics, as soon as the operator norm
of the interaction matrix is smaller than $1$. The inequality implies a control
on the mixing time of the Glauber dynamics. Our techniques rely on a
localization procedure which establishes a structural result, stating that
Ising measures may be decomposed into a mixture of measures with quadratic
potentials of rank one, and provides a framework for proving concentration
bounds for high temperature Ising models.
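A minimal sketch of the Glauber dynamics in question, assuming an interaction matrix scaled to have operator norm $0.5 < 1$ (the abstract's regime); the scaling, field, and number of update steps are illustrative choices.

```python
import numpy as np

def glauber_step(sigma, J, h, rng):
    """Resample one uniformly chosen spin from its conditional distribution.

    Assumes J is symmetric with zero diagonal and the Gibbs measure is
    proportional to exp(0.5 * sigma' J sigma + h' sigma).
    """
    i = rng.integers(len(sigma))
    field = J[i] @ sigma + h[i]                          # local field at site i
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))          # P(sigma_i = +1 | rest)
    sigma[i] = 1 if rng.random() < p_plus else -1

rng = np.random.default_rng(0)
n = 100
A = rng.normal(size=(n, n))
A = (A + A.T) / 2.0
np.fill_diagonal(A, 0.0)
J = 0.5 * A / np.linalg.norm(A, 2)       # operator norm 0.5, i.e. strictly below 1
h = np.zeros(n)
sigma = rng.choice([-1, 1], size=n)
for _ in range(50 * n):                  # 50 sweeps' worth of single-site updates
    glauber_step(sigma, J, h, rng)
print("empirical magnetization:", sigma.mean())
```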